[SPARK-44718][SQL] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value #42394

majdyz · 2023-08-08T11:45:00Z

What changes were proposed in this pull request?

Make the default column-vector memory mode follow the off-heap memory mode flag: `spark.sql.columnVector.offheap.enabled` now defaults to the value of `spark.memory.offHeap.enabled`. This prevents the vectorized reader from using on-heap column vectors by default when off-heap memory mode is enabled on the cluster.
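To illustrate the intended resolution order, here is a minimal, self-contained sketch. It is not Spark's `SQLConf` code: the `ColumnVectorDefault` object and its `columnVectorOffHeap` helper are hypothetical names, and a plain `Map` stands in for the session configuration. Only the two config keys and the `false` default of `spark.memory.offHeap.enabled` come from this PR thread.

```scala
// Illustrative model only; this is NOT Spark's SQLConf implementation.
object ColumnVectorDefault {
  val MemoryOffHeapKey = "spark.memory.offHeap.enabled"
  val ColumnVectorOffHeapKey = "spark.sql.columnVector.offheap.enabled"

  /** Resolve the effective column-vector setting from explicitly set values.
    *
    * Order: an explicit column-vector setting wins; otherwise fall back to the
    * off-heap memory flag; otherwise use that flag's default (false).
    */
  def columnVectorOffHeap(settings: Map[String, String]): Boolean =
    settings.get(ColumnVectorOffHeapKey)
      .orElse(settings.get(MemoryOffHeapKey))
      .map(_.toBoolean)
      .getOrElse(false)
}
```

Under this model, a cluster that sets only `spark.memory.offHeap.enabled=true` resolves the column-vector mode to off-heap, while an explicit `spark.sql.columnVector.offheap.enabled=false` still overrides it.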

Why are the changes needed?

To avoid unintentional use of on-heap memory in the vectorized reader when the user has enabled off-heap memory mode.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual & existing tests.

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

HyukjinKwon

LGTM cc @cloud-fan @Ngone51 @jiangxb1987

majdyz · 2023-08-14T10:15:23Z

The failing CI checks seem to be unrelated test failures.

cloud-fan · 2023-08-15T02:17:39Z

sql/core/src/test/scala/org/apache/spark/sql/ConfigColumnVectorModeDefaultSuite.scala

+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+
+class ConfigColumnVectorModeDefaultSuite extends SharedSparkSession {


It's probably too much to add a new test suite for this. We have tests for the config framework, and `.fallbackConf` is already tested by `ConfigEntrySuite`. Shall we just remove it?

cloud-fan · 2023-08-15T10:09:26Z

The failure is unrelated, I'm merging it to master, thanks!

…ffHeapMemoryMode config value

### What changes were proposed in this pull request?

Set the column vector default memory mode to depend on the off-heap memory mode flag. This is to prevent a user from using Vectorized-Reader with an on-heap column-vector by default when the off-heap memory mode is enabled on the cluster.

### Why are the changes needed?

Avoid the unintentional usage of on-heap memory in vectorized-reader when off-heap memory mode is enabled by the user.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual & existing tests.

Closes apache#42394 from majdyz/offheap-colvec-mode-default-value.

Lead-authored-by: Zamil Majdy <[email protected]>
Co-authored-by: Zamil Majdy <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

…ffHeapMemoryMode config value (cherry picked from commit afcccb4)

…lumnVector.offheap.enabled's doc field

### What changes were proposed in this pull request?

Followup of #42394

```
* - spark.sql.columnVector.offheap.enabled
  - When true, use OffHeapColumnVector in ColumnarBatch. Defaults to ConfigEntry(key=spark.memory.offHeap.enabled, defaultValue=false, doc=If true, Spark will attempt to use off-heap memory for certain operations. If off-heap memory use is enabled, then spark.memory.offHeap.size must be positive., public=true, version=1.6.0).
  - <value of spark.memory.offHeap.enabled>
  - 2.3.0
```

The doc field should interpolate `MEMORY_OFFHEAP_ENABLED.key` instead of `MEMORY_OFFHEAP_ENABLED`. In this PR, we remove the redundant doc, as it can also be found in `MEMORY_OFFHEAP_ENABLED.defaultValueString`.

### Why are the changes needed?

docfix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

manually debugging

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47165 from yaooqinn/minor2.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
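The doc bug described in this follow-up is an ordinary string-interpolation pitfall, which the sketch below reproduces in isolation. The `Entry` case class and the `DocInterpolation` object are hypothetical stand-ins, not Spark's `ConfigEntry`; they only show why interpolating the object itself leaks its whole `toString` into the documentation, while interpolating `.key` yields just the config name.

```scala
// Hypothetical stand-in for a config entry; NOT Spark's ConfigEntry class.
final case class Entry(key: String, defaultValue: Boolean)

object DocInterpolation {
  val MemoryOffHeapEnabled: Entry =
    Entry("spark.memory.offHeap.enabled", defaultValue = false)

  // Interpolating the object calls toString, so the whole entry ends up in the doc.
  val leakyDoc: String = s"Defaults to $MemoryOffHeapEnabled."

  // Interpolating the key yields only the config name.
  val keyDoc: String = s"Defaults to ${MemoryOffHeapEnabled.key}."
}
```

The follow-up PR goes one step further than switching to `.key`: according to its description, it drops the sentence entirely because the same information already appears in the default-value column.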

Match ColumnVector memory-mode config default to OffHeapMemoryMode config value

f410f3a

github-actions bot added the SQL label Aug 8, 2023

majdyz changed the title ~~Match ColumnVector memory-mode config default to OffHeapMemoryMode config value~~ [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value Aug 8, 2023

HyukjinKwon reviewed Aug 9, 2023

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (outdated; resolved)

HyukjinKwon changed the title ~~[SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value~~ [SPARK-44718][SQL] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value Aug 9, 2023

Address comment

b167c1d

majdyz requested a review from HyukjinKwon August 9, 2023 08:24

HyukjinKwon approved these changes Aug 11, 2023

Update ConfigColumnVectorModeDefaultSuite.scala

7618da7

cloud-fan reviewed Aug 15, 2023

Delete ConfigColumnVectorModeDefaultSuite.scala

0069d4a

majdyz requested a review from cloud-fan August 15, 2023 04:07

cloud-fan approved these changes Aug 15, 2023

cloud-fan closed this in afcccb4 Aug 15, 2023

yaooqinn mentioned this pull request Jul 1, 2024

[SPARK-44718][FOLLOWUP][DOCS] Avoid using ConfigEntry in spark.sql.columnVector.offheap.enabled's doc field #47165

Closed
